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ABSTRACT 

We present BLOBCAT, new source extraction software that utilises the flood fill algo- 
rithm to detect and catalogue blobs, or islands of pixels representing sources, in two- 
dimensional astronomical images. The software is designed to process radio- wavelength 
images of both Stokes / intensity and linear polarization, the latter formed through the 
quadrature sum of Stokes Q and U intensities or as a byproduct of rotation measure 
synthesis. We discuss an objective, automated method by which estimates of position- 
dependent background root-mean-square noise may be obtained and incorporated into 
BLDBCAT's analysis. We derive and implement within BLOBCAT corrections for two sys- 
tematic biases to enable the flood fill algorithm to accurately measure flux densities 
for Gaussian sources. We discuss the treatment of non-Gaussian sources in light of 
these corrections. We perform simulations to validate the flux density and positional 
measurement performance of BLOBCAT, and we benchmark the results against those 
of a standard Gaussian fitting task. We demonstrate that BLOBCAT exhibits accurate 
measurement performance in total intensity and, in particular, linear polarization. 
BLOBCAT is particularly suited to the analysis of large survey data. 
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1 INTRODUCTION 

In radio astronomy image analysis, for which approxima- 
tions of Gaussian noise statistics and Gaussian source mor- 
phologies are suitable, much attention has been pai d to least 
squar es 2D elliptical Gaussian fitting routines (e.g. ICondonl 
1 1997h . Such routines, for exam ple those im plemented within 
the M IRIAD (|Sault et al.lll995l ) and AIPS dBridle fc Greisenl 
1 1994 ) packages, are appropriate for source extraction when 
fitting parameters have been carefully inspected or con- 
strained. However, when left unconstrained, the accuracy 
of these Gaussian fits may become degraded, requiring sig- 
nificant manual inspection overheads to identify poor fits 
and ensure high quality source extraction. Gaussian fitting 
routines may therefore be unsuited to the general analysis 
of large survey data. 

In this work we seek to develop a robust alternative 
to Gaussian fitting by utilizing the flood fill algorithm 
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l|Liebermanll 19781 : iFishkin fc Barskvll 19851 ) . In particular, we 
seek to develop a source extraction procedure that incor- 
porates an accurate, objective, and automated method of 
background root-mean-square (rms) noise estimation, and 
to develop the first accurate method of source extraction for 
resolved sources in linear polarization. Additional factors 
motivating this work are described as follows. 



First, a number of large radio surveys are planned 
for the near future, capitalising on upcoming new 
or substantially upg raded facilities such as ASKAP 



Johnston et all |200S| ; iDeboer et all 120091), M EERKAT 
Jonasl |2009|). LOFAR dRottgering et all BoioT) ALMA 
ilWootten fc Thompson! 120091 : iHills et all l20ld) LWA 
ifallingson et all 120091), WSRT (|Oosterloo et alj 120091 ). 
EVLA (|Perlev et al.ll201ll ). and many others including VLBI 
networks and epoch of reionisation instruments. With these 
facilities will come a number of large surveys in both to- 
tal intensity and linear polarization, for example EMU 
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( Norris et al.1 l201lh WODAMI, MIGHT Erfl, POSSUM 
ibaensler et all l2010l ). and GALFACTS jTavlor fc Salted 
I2010T I. The ability to catalogue objects within the large im- 
ages produced by these surveys, with as little manual in- 
tervention as possible, will be key to maximising scientific 
output. We seek to develop a robust, automated method of 
source extraction that requires only the most complex of 
sources to be manually inspected. 

Second, recent polarimetric studies have indicated an 
increase in the fra ctional polarization of faint extragalactic 
radio sources (e.g . [Taylor et all 120071: Subrahmanvan et al.l 
l20ld ; iGrant et al.ll2oTd ; IShi et al.ll2010l ), which are difficult 
to re concile with population modelling (jO'Sullivan et all 
2008). We seek here to subject the process of polarization 
measurement to close scrutiny, and to provide the commu- 
nity with a measurement tool that has been assessed within 
a controlled testing environment. 

And third, the flood fill algorithm underpins a number 
of existing source extraction routine s, such as those avail- 
able in the CUPIl fl (e.g. CLUMPFIHD IWilliams et alj Il994h 
and SExtractor (|Bertin fc Arnoutsl [l99dT packages. How- 
ever, these routines are unable to measure flux densities 
without performing subsequent Gaussian (or similar) source 
fitting. Alternatively, the flood fill algorithm has been used 
without the subsequent least squares fitting step for the 
customised an alysis of extended, n on-Gaussian sources in 
total intensity ([Murphy et al.|[2007l ) and linear polarization 
|Heald et al.ll2009l ). However, the raw flood fill algorithm as 
implemented in these works is not suitable for use with com- 
pact (unresolved or resolved Gaussian) sources, as their flux 
density measurements suffer from two significant systematic 
biases. In this work we describe how to correct for these 
biases in a robust manner, so as to enable the flood fill ap- 
proach to handle both Gaussian and non-Gaussian sources. 

We have implemented these bias corrections within a 
new flood fill program called BLOBCAT, which catalogues 
blobs in astronomical images. We use the term blob in an 
image-processing sense to represent an island of agglomer- 
ated pixels within a sea of noise, and to indicate that its 
properties are not inferred by fitting (e.g. least squares). We 
have designed BLOBCAT for use in radio astronomy, attempt- 
ing to produce a program capable of encapsulating the entire 
measurement process between observational image and out- 
put catalogue. 

This paper is organised as follows. In §[2] we describe the 
algorithms implemented within BLOBCAT, detailing required 
program inputs, including the minimal set required for oper- 
ation, and output data products. In §[3] we assess BLOBCAT's 
peak surface brightness (SB), integrated SB, and positional 
measurement performance. We investigate the program's 
ability to handle unresolved, resolved, and complex (non- 
Gaussian) sources in images of total intensity (Stokes 7) 
and linear polarization (L or I/ RM ; these terms are defined 
m §C3>, an( A discuss issues regarding polarization bias. For 
comparison, we also assess the performance of a standard 
Gaussian fitting routine. In § [4] we discuss two examples 
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Figure 1. Overview of BLQBCAT. 



of post-processing that may be required to make full use 
of BLOBCAT's output catalogue; these are particularly rele- 
vant for data containing extended non-Gaussian, or multiple 
blended Gaussian, sources. In § [5] we present our summary 
and conclusions. 



2 HOW BLOBCAT WORKS 

BLOBCAT is written in the scripting language Python. The 
program is design ed to catalogue blo bs in a two-dimensional 
(2D) input FITS ([Pence et al-hoid ) image of SB. To isolate 
blobs and determine their properties, BLOBCAT requires an 
estimate of the background rms noise and degree of band- 
width smearing at each spatial position (pixel) within the SB 
image. These two diagnostics may be provided to BLOBCAT 
as either uniform (spatially-invariant) values or, more gen- 
erally, as 2D input FITS images that encode the more re- 
alistic scenario whereby noise and smearing properties vary 
with spatial position over the SB image. 

An overview of BLOBCAT is presented in Fig. [T] In the 
following sections we describe the input images and their re- 
quirements (§ 12. 1[) . the core flood fill algorithm used to iso- 
late blobs (§ I2.2jl . the key morphological assumption (§ I2.3[l 
and bias corrections (§ 12. 4[) applied to extract blob proper- 
ties, the input arguments required to run BLOBCAT (§ 12.51) . 
the output catalogue (§ 12. 6p . and the optional program out- 
puts (§ UTTJ. 



http: / /www. astron.nl/radio-observatory/apcrtif-eoi-abstracts- 
and-contact-information 

2 Van der Heyden K., Jarvis M. J., 2010, MIGHTEE proposal to 
MEERKAT 

3 http://starlink.jach.hawaii.edu/starlink/CUPID 



2.1 Input Images 

BLOBCAT requires up to three input FITS images, as outlined 
in Fig. [T] For flexibility, the images of background rms noise 
and bandwidth smearing are optional, and may instead be 
replaced by spatially-invariant input values. 
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2.1.1 Surface Brightness 

BLOBCAT is designed to analyse blobs with positive SB. To 
detect negative blobs, the input SB image must be inverted 
before use. In this paper we focus on the analysis of blobs in 
images of total intensity and linear polarization (L or Lrm J - 
BLOBCAT may also be used to analyse images of Stokes Q, U, 
and V intensities, though we note that resolved sources ex- 
hibiting both positive and negative SB in these images will 
be incorrectly handled; we do not attempt to address the 
analysis of such sources here. We assume that blobs of in- 
terest in total intensity and linear polarization may be char- 
acterised by 2D elliptical Gaussians, though we do consider 
the treatment of non-Gaussian blobs later in § 14.21 Image 
pre-processing techniques to remove wide-spread extended 
features prior to t he analys i s of more compact sources may 
be required (e.g. iRudnickl |2002l : iRudnick fe Brownl 12009 : 



lOppermann et aL1l2011 



We assume that images of L RM are produced follow- 
ing the application of rota tion measure (R M) synthesis 
jBrentiens fc de Bruvr]|2005l , and RMCLEAN l|Heald et al.1 
l2009h such that for each spatial pixel located at pixel coor- 
dinate (x,y), the polarized emission is obtained by taking 
the peak value in the cleaned Faraday dispersion function, 
namely 



L RM (x, y) = ma,x(\\F c 



\x,y,(j))\\) 



(1) 



where </> m Faraday depth. We note that this definition of 
L RM assumes Faraday spectra along each pixel sightline con- 
sisting of no more than a single unresolved Faraday compo- 
nent (additional components will be ignored); analysis with 
more advanced models of L RM are beyond the scope of this 
work. Anal ysis involving equa tion (fTJ is demonstrated, for 
example, bv lHeald et al.l l|2009h and Hales et al. (in prepara- 
tion). Alternatively, images of standard linear polarization, 



L(x,y) = ^Q(x, y y + U(x,y)2, (2) 

may b e used. See lLeahv fe Ferninil l| 19891) and Vaillancourtl 
(2006) for statistical properties of L, and lHales et alJ ( 20121 ) 
for statistical properties of both L and L EM . For simplicity 
in subsequent discussion, we neglect the pixel coordinate 
notation (x, y) affixed to all spatially variable parameters, 
unless required for clarity. 

2.1.2 Background RMS Noise 

If position-dependent rather than spatially-invariant blob 
detection thresholds are required, then a background rms 
noise image must be specified. The user is required to inde- 
pendently construct a suitable noise map for their SB im- 
age, for example using the rms estimatio n algorithm imple- 
mented within the SE xtractor package l)Bertin fc Arnoutsl 
ll996l : lHolwerdall2005l ). 

Despite having been originally developed for the analy- 
sis of optical photographic plate and CCD data, SExtractor 
has been found to be reliable when generating noise map s 
from radio data l|Bondi et all 12003 : iHuvnh et ail l2005f) . 
SExtractor determines the rms noise at each spatial pixel 
in an image by extracting the distribution of pixel values 
within a local mesh, iteratively clipping the most deviant 
values until convergence is reached at ±3u about the me- 
dian. The choice of mesh size (in pixel 2 ) is very important. 



If it is too small, the local rms estimate may be biased due 
to lack of statistically independent measurements or over- 
estimated due to the presence of real sources. If it is too 
large, any true small-scale variations in local rms noise may 
be washed out. At least N b — 80 independent resolution 
elements (beams) per mesh area are required in order to 
reduce the uncertainty in estimates of local rms noise to be- 
low {[1 + 0.75/ (N b - 1)] 2 [1 - A^ 1 ] - 1}° 5 = 8% (using an 
approximation to the uncertainty o f the standard error es - 
timator, suitable for N b > 10; p. 63, |johnson fc Kot3ll970l ). 
The mesh area, H me &h> may be calculated according to 



a 



where 



n b = 



41n2 



e maj e„ 



(3) 



(4) 



is the beam volume for a 2D elliptical Gaussian with full- 
width at half-maximum (FWHM) along the major and mi- 
nor axes given by ma j and O min , respectively, and where d = 
7r/\/T2 is the densest lattice packing for congruent copies 
of an y convex shape (e.g. circles, ellipses; IPach fc Agarwaj 
Il995l) . It is customary in physical sciences to treat rms noisejj 
values, such as those reported by SExtractor, as standard 
errors in order to boost noise estimates in regions where 
extended non-signal features are present; namely by defin- 
ing that <t z = (z rmB ) SExtractor- In other words, by using rms 
noise estimates to calculate local signal-to-noise ratio (SNR) 
thresholds for blob detection, it is possible to take into ac- 
count not only local variations in image sensitivity, but also 
the possible presence of DC offsets due to artefacts (e.g. side- 
lobes). For this reason we recommend the method of using 
SExtractor or a similar package to estimate noise over the 
method of simply estimating it from, say, Stokes V because 
it can take into account features in the data that may be 
missed by more theoretically motivated expectations. The 
procedure described above, incorporating equation (]3}, may 
be easily automated to provide objective estimates of rms 
noise for any noise-dominated image. 

Finally, we note that the SExtractor procedure above is 
suitable for determining the rms noise in images of Stokes I, 
Q, U, or V, but not L RM (nor L). Instead, to determine <t rm 
at each spatial location in L RM , SExtractor should be run on 
each constituent Qi and Ui image in each i'th of T frequency 
channels to obtain a Qii and o Uti . These in turn may then be 
combined using weighted least squares as l|Hales et al,ll2012h 



0" R M — 



^ 0.2 min (CTS.i.CTS-.i) +0.8max [a^^, a-fr.i) 



(5) 

where the term £ represents th e correlation correc tion factor 
defined by equation (23) from lHales et al.1 l)2012h . 



2.1.3 Bandwidth Smearing 

If corrections for position-dependent bandwidth smearing 
(chromatic aberration) are required, then an image detail- 
ing the degree of smearing at any location within the SB 
image must be specified. Bandwidth smearing is due to 



4 The definition of rms noise is Zr ma 
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the finite bandwidth of frequency channels, resulting in a 
radially-dependent convolution (smearing) that worsens as 
a function of positional offset from the phase tracking centre 
of a single-pointed rad io observation l|Condon et al.l 1 19981 : 
iBridle fc Schwab! 1 19991) . The effect is to decrease the peak 
SB and to increase the observed size of sources without af- 
fecting their integrated SB. Bandwidth smearing needs to 
be carefully accounted for in mosaics consisting of multiple 
overlapped pointings. This is because any location in a mo- 
saicked image, even one situated over a pointing centre, may 
include multiple contributions from adja cent pointings i n 
which bandwidth smearing is sig nificant (|lbar et all [2009). 
The bandwidth smearing image input to BLOBCAT should 
map out the ratio between the observed smeared peak SB, 
S p , and the true unsmeared peak SB, S B ws , for all spatial 
positions within the SB image (using notation consistent 
with that introduced later in this work). We denote the lo- 
cal degree of bandwidth smearing as 



possible to conserve both angles and areas when mapping 
portions of a sphere to a plane. 



S p 



« 1) 



(6) 



2.1.4 General Requirements 

All images input to BLOBCAT must have the same dimensions 
and be located on the same pixel grid; for cataloguing pur- 
poses we require that the primary image world coordinate 
system is expressed in equatorial coordinates (RA-Dec). In 
order to measure fitted Gaussian peaks to within 1%, at least 
5 pixels per resolution element FWHM should be present 
(see Appendix A) . 

BLOBCAT does not calculate the Jacobian of the transfor- 
mation between proje ction plane coordinates and native lon- 
gitude and latitude l|Calabretta fc Greisenl 2002). Instead, 
BLOBCAT requires that input images are gridded to an equal- 
area projection, so as to ensure that sky area per pixel is pre- 
served. BLOBCAT supports both zenithal equal-area (ZEA) pro- 
jection (the premier scheme for a hemisphere) and Hammer- 
Aitoff (AIT) equal-area projection (suitable for all-sky im- 
ages) l|Calabretta fe Grei scn 2002). Failure to use an equal- 
area projection will lead to systematic biases in BLOBCAT's 
extracted flux densities and visibility area (sky density) cal- 
culations (see § 12. 6[) . However, there are two common sit- 
uations where this equal-area requirement may be relaxed. 
The first is when measuring flux densities for unresolved 
sources by obtaining their peak pixel or fitted peak value 
(cf. Appendix A). The second involves the use of images with 
non-equal-area project ions; for exam ple, the North-celestial- 
pole (NCP) projection |Greisen|ll983l ). For such images, flux 
density measurements for resolved sources, which require in- 
tegration over SB (i.e. over pixels), will only be suitable for 
sources situated close to the image reference point wher e 
distortion effects are minimal (|Calabretta fc Greisenl [20021 ). 
To enable such analysis, BLOBCAT also supports images in 
NCP projection or the more general slant orthographic (SIN) 
projection. Regridding of input images to one of the ZEA, 
AIT, NCP, or SIN projection schemes may be computed us- 
ing, for example, the WCSLIB3 package. Finally, we remark 
that equal-area projections do not preserve shape; it is not 
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2.2 Flood Fill Algorithm 



BLOBCAT uses the flood 
rithm dLiebermanl 



197; 
2008) 



fill, or thresholding , algo- 
iFishkin fc Barskvl Il985l : 
to isolate individual blobs 



ISonka. Hlavac fc Bovkj 
(islands) of pixels from within a SNR map. The SNR map 
is formed by taking the pixel-by-pixel ratio between the 
input SB and background rms noise images. In units of 
dimensionless SNR, we denote the threshold for detecting 
blobs as Td and the cut-off threshold for flooding down 
to as Tf. By applying thresholds in the SNR map rather 
than the SB image, local variations in sensitivity can be 
accommodated. We do not take into account bandwidth 
smearing at this initial flooding stage (see § l2.6l below). We 
have implemented the hig hly optimised flood fill algorithm 
from iMurphv et al.l (|2007l ) within BLOBCAT, which operates 
as follows. 

(i) Locate all pixels in the SNR map that have value ^ Td, 
including those pixels that would meet this detection thresh- 
old if it were not for pixellation attenuation (see Appendix 
A and comments below). 

(ii) Form blobs about each of these pixels by 'flooding' 
adjacent pixels that have value ^ Tf. 

(iii) For each isolated blob, perform bias corrections 
(§ 12. 4p and catalogue properties (§ 12. 6[) . 

We denote the peak SB observed within the peak pixel for 
each blob by S° BS (with units Jy beam -1 ), and the resulting 
observed peak SNR by A OBS = S° BS /a. To minimise the 
attenuating effect of pixellation on Sp BS , BLOBCAT calculates 
a fitted peak SB for each blob by applying a 2D parabolic 
fit to a 3 x 3 pixel array about the raw peak, as described 
in Appendix A. We denote this fitted peak by £> FIT , and the 
resulting fitted peak SNR by A FIT = S* IT /a. We denote 
measurements of integrated SB by S^ s (with units Jy), 
which are obtained for each blob by summing their flooded 
pixel intensities and dividing by the beam volume (fib). 

BLOBCAT attempts to perform its internal calculations, 
as described in the following sections, using the fitted peak 
quantities S FIT and A FIT . However, if S FIT < S° BS , as may 
occur for heavily pixellated images (namely, for small values 
of N a and N$ as defined in Appendix A), then for consis- 
tency BLOBCAT sets S FIT = S° BS (and thus yl FIT = S° BS /a) 
to ensure that blobs with Sp IT < T d yet S° BS > T d are not 
unfairly rejected from the output catalogue. For notational 
simplicity in subsequent discussion we will use the super- 
script 0BS to refer to both unfitted and fitted peak quanti- 
ties; we will not distinguish between 0BS and FIT quantities 
unless required for clarity. 

We now turn to the key morphological assumption used 
to infer physical properties of these isolated blobs from their 
raw observed measurements. 



2.3 Blob Morphology Assumption 

In aperture synthesis imaging, individual resolution ele- 
ments are described by the morphology of the dirty beam 
(the Fourier transform of the sampling distribution). Pro- 
vided that the central core of the dirty beam can be suitably 
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Figure 2. Flood fill algorithm applied to a noise-frcc 2D ellip- 
tical Gaussian blob with peak SNR A. The detection threshold 
is T,j. The blob is flooded from the peak down to the detection 
threshold Tj . Flood fill can only measure a fraction of the blob's 
total volume, r) (equation I I16II ), as indicated by the shading. The 
width of the blob at A/2 (the FWHM) is ip. 



approximated by an elliptical Gaussian, then individual res- 
olution elements in the resulting images can be described by 
2D elliptical Gaussians. In other words, point sources will 
appear as Gaussians in an image. 

In BLOBCAT we assume that each isolated blob is de- 
scribed by a 2D elliptical Gaussian characterised by a peak 
SNR, A, and representative major and minor FWHMs ip r 
and ip s , respectively (representative because these FWHMs 
are never individually measured, as we discuss shortly). In 
§ l3.3l arid § l4.2l we discuss situations where this assumption of 
Gaussian blob morphology is poor. The general equation for 
a 2D elliptical Gaussian, located at the origin of an arbitrary 
coordinate frame (r, s) that is aligned with the major/minor 
axes, is given by 



f(r,s) = Aexp 



This equation is valid for Gaussian blobs in noise-free images 
of either total intensity or linear polarization. The volume 
of this 2D Gaussian is 

Qg = 41n~2^ s ■ ^ 
This general setup, including detection thresholds as denned 
in § [221 is shown in Fig. [2] 



2.4 Blob Bias Corrections 

BLOBCAT applies two important corrections to each isolated 
Gaussian blob in order to prevent systematic biases from 
affecting its peak and integrated SB measurements. These 
corrections account for: 

(i) The positive peak surface brightness bias exhibited by 
S° BS for resolved blobs; and 

(ii) The negative integrated surface brightness bias exhib- 
ited by S^ s caused by the limited blob volume accessible 
to flooding before the cut-off threshold T/ is reached. 




0.5S 



Figure 3. Idealised representation of the positive bias encoun- 
tered when measuring the peak SB of a resolved Gaussian blob 
embedded in noise. Shown are two resolved Gaussian blobs, each 
with (true) peak SB S p and 7 resolution elements per FWHM. 
For visual and conceptual simplicity, noise is represented by a 
sine wave and it is assumed that a large number of pixels pop- 
ulate each resolution element (such that pixellation effects may 
be ignored; i.e. Sp IT = S° BS ). Two equally likely noise super- 
positions are shown. The left blob encounters a positive noise 
contribution to its peak SB while the right blob encounters a 
negative noise contribution (trough). In both cases the observed 
peak SB overestimates the true peak SB, leading to a systematic 
positive bias for resolved sources. BLOBCAT corrects for this bias 
with equation ||14|I . as parameterised by the area sliced at Act be- 
low the observed peak. If A is too small, the bias correction itself 
may become biased due to volatility in the small area sliced, as 
illustrated. 



correct for it, we first examine the following experiment. 
Consider for simplicity that blobs are represented by tophat 
functions rather than 2D elliptical Gaussians, that images 
are produced with one pixel per resolution element, and that 
noise is Gaussian in character. Noise is always resolved on 
the same spatial scale as unresolved sources. Therefore, the 
peak SB of an unresolved blob, here observed as the mag- 
nitude of a single pixel, will be affected by a single noise 
sample which may be positive or negative. For an ensem- 
ble of such unresolved blobs, each with identical true peak 
SB but different noise sample, the average observed peak 
SB will be an unbiased tracer of the true peak SB. Now 
consider a resolved tophat blob, over which M independent 
noise samples will be present. The observed peak SB of this 
resolved blob will depend on the maximum of M indepen- 
dent noise samples, rather than M = 1 for an unresolved 
blob. Thus the more resolved the blob becomes, the larger 
M becomes, and the less likely it is that a negative noise 
sample will be selected as the observed peak SB. The aver- 
age observed peak SB for an ensemble of identically resolved 
blobs will therefore be positively biased from its true value. 
Before returning to 2D elliptical Gaussians, we will describe 
how to correct for this positive bias in the context of order 
statistics using the simpler tophat blob morphology. 

For a sample of M independent and identically- 
distributed variates Xi , X 2 , . . . , X M ordered such that 
X(i) < X( 2 ) < . . . < X ( M ) (using notation X, for unordered 
variates and Xyj for ordered variates), then is known 
as the fc'th order statistic and X( M ) = max(X,). If X has 
density function f(X) an d distribution function F(X), then 
iDavid fc Nagaraial (2003) give the density function for X m 



2.4-1 Peak Surface Brightness Bias 

An illustration outlining the need for the first correction is 
presented in Fig. [3] To understand this bias and how to 
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f(X) 



k) 
- F(X)] 



(9) 
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Figure 4. Expectation value in noise units of a for the largest 
of M independent Gaussian variates (equation 112} ). The expec- 
tation value is for M = 1. A polynomial fit to the curve is given 
by equation JHJ. 



The density function for the maximum of M independent 
Gaussian variates with variance a is obtained from equa- 
tion ((9| by setting k — M, giving 



/(*( 



M 

1 

2 



exp 



1 + erf 



2a 2 
X 
as/2 



where erf is the error function defined by 

erffa) = —= e at . 
V 71 " Jo 

The expectation value for equation (|10[) is given by 

/>00 

E[/(X (M) )]=/ f(X iM) )dX, 



(10) 



(11) 



(12) 



which is plotted for a range of M samples in Fig. 2] 
Equation ()12|) represents the average positive bias existing 
between measurements of observed peak SB and true peak 
SB for a tophat blob. Given measurement of M, namely the 
number of independent resolution elements present over the 
extent of the blob, an estimate for the bias can be obtained. 
The bias is most pronounced for low SNR resolved blobs; 
for a tophat blob of extent ~ 4 resolution elements, the bias 
for a 5cr blob is ~ 1.0cr/5cr = 20% (see Fig. E}. 

We now return to the scenario whereby blobs are as- 
sumed to represent 2D elliptical Gaussians. Instead of ob- 
taining M from the full spatial extent of a tophat blob, M 
needs to be estimated from the observable properties of a 
2D Gaussian embedded in noise. In BLOBCAT we estimate M 
by approximating that the relevant number of independent 
resolution elements contributing to the positive bias can be 
extracted from the cross-sectional area contained within a 
slice of constant-SNR at a few a below the peak, as parame- 
terised by A in Fig. BLOBCAT measures the cross-sectional 
area for each blob at SNR = (A OBS ~ A), which we denote 
H\, by flooding from the peak to this threshold and simply 
counting the number of pixels present. M is then estimated 
using (cf. equation @) 



M 



d_ 



(13) 



To determine the positive bias between S'° BS and S v for 
resolved blobs, BLOBCAT uses the following fifth-order poly- 
nomial fit to the curve in Fig. [4] to form a simple lookup 



table (rather than solving for equation Ijl2| l). 

5 

M = 1 + ^2 a^ , 



(14) 



where 



E [/(*(«>)] 

gOBS j ^OBS 



S n 



A 



(15) 



and where ai = 0.89, = 0.27, 03 = 3.75, 04 = —3.67, and 
a 5 = 1.61. 

To illustrate the constraints on selecting A, imagine 
trying to correct the raw observed peak SB for a resolved 
Gaussian blob, detected with peak SNR = 50, by arbitrar- 
ily defining that the relevant spatial extent be measured at 
A = 20. Choosing M in this way will overestimate the peak's 
positive bias, because not even a 10ct noise spike located at 
the SNR = 30 contour of the blob could be mistaken for the 
true peak. Alternatively, choosing too small a value of A will 
not only underestimate the peak bias in the opposite manner 
to above, but also render M vulnerable to additional nega- 
tive bias due to H\ being fooled (limited in spatial extent) 
by noise troughs near the blob's peak. 

We performed simulations to empirically determine the 
most suitable range of values for A. We found that choos- 
ing A = 3.5 best corrected for the positive bias exhibited by 
Sp BS for resolved blobs in images of either total intensity or 
linear polarization [L or L RM ). We discuss the simulations 
used to determine this optimum A, as well as the general per- 
formance of the peak SB bias correction from equation (|14[) , 
in §|3 

2.4.2 Integrated Surface Brightness Bias 

To prevent the flood fill algorithm from cascading into noise 
features adjacent to real blobs, flooding is terminated at the 
cut-off threshold, Tf. The integrated SB measured for each 
blob, Sf^ s , therefore underestimates the true integrated SB, 
Sint, because only a limited fraction of the total volume for 
each blob is ever directly accessed. We denote this fraction 
77, as indicated in Fig. [2] 

By integrating the volume flooded between A (true peak 
SNR) and Tf for a 2D elliptical Gaussian blob, and dividing 
this result by the total volume of the blob (equation (JSJ), 
the fraction of flooded volume r\ is found to be 



V ■ 



erf 



In 



(16) 



BLOBCAT corrects the observed integrated SB for each de- 
tected blob (regardless of blob dimension) by simply divid- 
ing by 77, namely 



Tj 



(17) 



It is important to note that A in equation (|16[) is the true 
peak SNR. For resolved blobs, the peak bias correction from 
equation (|14[) needs to be applied first, so as to debias the 
observed peak SNR, yl OBS , and return an estimate for the 
unbiased peak SNR, A. The effect of using uncorrected peak 
SNRs for resolved sources in equation (fT6)l is demonstrated 
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The choice of Tf affects the maximum volume that can 
be flooded within a faint blob. So as to recommend a min- 
imum value, we performed simulations of integrated SB re- 
covery for 2D elliptical Gaussian blobs embedded within 
images of total intensity and linear polarization; the de- 
tails of these simulations are discussed in § [3] We incre- 
mentally reduced Tf in these simulations, seeking a balance 
between the measurement of as much volume as possible 
within faint blobs, and the need to avoid bias from poten- 
tial over-flooding of neighbouring noise features. 

In total intensity images for blobs as faint as A — 5, 
we found that a cut-off threshold of Tf — 2.6 was re- 
quired in order to robustly flood as many true blob pixels as 
possible whilst avoiding over-flooding of adjacent non-blob 
(noise) pixels. In linear polarization images (L or L RM ), non- 
Gaussian n oise statistics typically limit detectio n thresholds 
to T d > 6 (|Vaillancourtll200d ; lHaies et alj|2012h . These im- 
ages thus require higher flooding thresholds than those for 
total intensity; we note that a comparison between the av- 
erage cross-sectional profile of a Gaussian blob embedded 
in images exhibiting Gaussia n, L, and L RM statistics is pre- 
sented bv lHales et al.l (|2012T l. In images of Lrm for blobs as 
faint as A = 6, we found that a cut-off threshold of Tf = 4.0 
was suitable. We note that this value of Tf is dependent on 
the observational setup used to produce Lrm- To determine 
an equivalent value of Tf for any L or L RM image, a cut-off 
with equal statistical significance t o our suggested Tf = 4.0 
value should be calculated (e.g. see lHales et al.l F2012'). 

For a detection threshold of Td — 5 in an image of 
total intensity, equation (|16p with Tf = 2.6 implies that the 
maximum correction factor for any blob is I/77 < 1.8. In 
linear polarization, for a detection threshold of Td ~ 6 and 
Tf — 4.0, the maximum correction factor is 1/rj < 2.5. 



2.5 Program Inputs 

If accurate error estimates are not immediately required, 
BLOBCAT does not requires many inputs to run. Preliminary 
analysis can be performed on a single input SB image by 
specifying three parameters: a background rms noise value 
(simply so that SNRs can be computed at any spatial loca- 
tion within the image), a blob detection SNR threshold (Td), 
and a cutoff SNR threshold for flooding (Tf). However, to 
make full use of the output catalogue, particularly errors, 
additional input parameters are required. For completeness, 
we list all BLOBCAT input arguments in Appendix B. 



2.6 Output Catalogue 

BLOBCAT produces an output catalogue containing 41 entries 
for each detected blob. In this section we list and define 
these entries, which include final measurements of peak and 
integrated SB, corrected for bandwidth smearing and clean 
bias, errors, and the 'visibility' area for each blob. The cat- 
alogue entries, some of which require various BLOBCAT input 
arguments to be specified (see Appendix B), are as follows. 

Column 1: ID 

Blob identification number, ordered by decreasing observed 
peak SNR (see Column 26). 

Column 2: npix 
Number of flooded pixels comprising blob. 



Columns 3-4: x_p, y_p 
RA and Dec of peak pixel in pixel coordinates. 

Columns 5-6: RA_p, Dec_p 
RA and Dec of peak pixel in degrees. 

Column 7: RA_p_err 
Total position error in RA of peak pixel, which we define as 



(18) 



The first term, cj 2 a cal , represents the positional uncertainty 
of the phase calibrator, for example with reference to the 
International Celestial Reference Frame, that was used to 
produce the SB image. The second term, <ff rame , repre- 
sents the positional uncertainty of the image frame about 
the (assumed) position of the phase calibrator. Given that 
image positional errors correspond to Fourier-plane phase 
errors, of ramc may be estimated by measuring ctsem, the 
standard error of the mean (SEM) of the variation in 
the phase corrections result ing from phase self-calibratiorjf] 
l|Cornwell fc Fom alont 199^). By estimating the fraction of 
a resolution element by which positions may be in error as 
csem/180°, BLOBCAT estimates the frame error as 

1 (lg) 



@ot, frame 



y/2 180' 



-e 



where the factor of y2 projects the 2D SEM along one of two 
orthogonal axes, and where <d a is the projected resolution 
along the RA-axis. a is given by 



©moj ©mm 



(e cosx) + (© sinx) 



(20) 



where \ is the position angle of the major axis East of North. 
The third term, rj 2 a blob , encapsulates positional error due to 
the significance of the blob detection, which we define for 
reasons described later in § 13.1.31 and § 13.2.31 as 



Oa.blob 



1AA 



e 



(21) 



°S = \/04,cal + ^Iframo + ^f.blob » ( 22 ) 



Column 8: Dec_p_err 
Total position error in Dec of peak pixel, which we define in 
a similar manner to equation (|18|) as 



where 



O"i,blob 



1 ""SEN 

1 e, 



1AA 



(23) 
(24) 



and where the projected resolution along the Dec-axis is 
given by 



Qs = 



J (®maj sinx) 2 + (@min cosx) 2 



(25) 



6 Note that regardless of whether or not self-calibration phase 
corrections are applied to the visibility (Fourier) data prior to fi- 
nal imaging (i.e. it is possible to calculate the required phase cor- 
rections without applying them), the systematic positional offset 
between the image frame and the phase calibrator c an be char- 
acter ised by the SEM of the phase corrections (e.g. lHales et all 
l200Sh . 
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Columns 9-10: x_c, y_c 
RA and Dec of area (unweighted) centroid in pixel coordi- 
nates, 



npix 



(26) 



where Xj = (xi,yi) £ blob. 

Columns 11-12: RA_c, Dec_c 
RA and Dec of unweighted centroid in degrees. 

Column 13: cFlag 
Centroid flag. If (x c ,y c ) is located within a flooded pixel, 
then cFlag = 1; otherwise cFlag = 0. 

Columns 14-15: x_wc, y_wc 
RA and Dec of SNR-weighted centroid in pixel coordinates, 



, Umc) — 



ESx^ OBS (x.) 



(27) 



E ^ 1 M°BS(x < ) 

Columns 16-17: RA_wc, Dec_wc 
RA and Dec of SNR-weighted centroid in degrees. 

Column 18: wcFlag 
Weighted centroid flag. If (x mc ,y wc ) is located within a 
flooded pixel, then wcFlag = 1; otherwise wcFlag = 0. If 
wcFlag = 1, then RA_wc and Dec_wc from Columns 16-17 
above are the formal position of the blob. If wcFlag = 0, the 
blob is likely to be significantly non-Gaussian; the weighted- 
centroid position may not be suitable for formal cataloguing 
purposes. Manual inspection, or formal cataloguing using 
the raw peak pixel or area centroid positions, may be re- 
quired. 

Columns 19-22: x_min, x_max, y_min, y_max 
Minimum and maximum pixel coordinate in RA (x) and Dec 
(y) within blob. 

Column 23: rms 
Rms noise, a, at position of peak pixel. 

Column 24: BWScorr 
Bandwidth smearing correction, 1/vo (from equation (©). 

Column 25: M 

Number of independent resolution elements from equa- 
tion (|13p . M is used in equation (|14|) to correct for the pos- 
itive peak bias exhibited by resolved blobs. To prevent this 
bias correction from being applied to noise-affected unre- 
solved blobs (i.e. where the number of pixels flooded is arti- 
ficially boosted due to a connected noise feature), BLOBCAT 
only applies the correction to those blobs with M ^ 1.1; the 
suitability of this value was determined empirically. 

Column 26: SNR.OBS 
Observed (raw) SNR, A OBS = S° BS /cr. 

Column 27: SNR.FIT 
Fitted SNR, A F1T = S* lT /a. 

Column 28: SNR 
SNR, A, corrected for peak bias (equation (1141) '). 

Column 29: S_p_0BS 
Observed (raw) peak SB, S° BS . 

Column 30: S_p_FIT 
Fitted peak SB, Sj IT , obtained using a 2D parabolic fit to 
a 3 x 3 pixel array about the raw peak pixel (x p ,y p ). If 
S£ IT < S° BS , then BLOBCAT sets S" T = S° BS so as to use 
the more accurate measurement (see Appendix A and § 12.21) . 

Column 31: S_p 
Peak SB, S p , corrected for peak bias (equation l|14|l). 

Column 32: S_p_CB 
Peak SB corrected for peak bias and clean bias, S P B . Clean 



bias is a deconvolution effect that redistributes SB from 
real blobs to noise peaks, systematically re ducing the ob- 
serve d SB of blobs independent of their SNR (jCondon et al.l 
1998). The effect is more pronounced for observations with 
poor Fourier-plane coverage. Given the degree of clean bias 
present in the SB image, AS CB Jy beam" 1 ), BLOBCAT 
makes the following correction, 



S ,, 



S p + AS C 



(28) 



Column 33: S_p_CBBWS 
Peak SB corrected for peak bias, clean bias and band- 
width smearing, S p B,BWS . Using the input value of w (equa- 
tion (O), BLOBCAT corrects for bandwidth smearing with 

~rCB 



S D 



S D 



(29) 



This is the final reported value of the blob's peak SB, to be 
used for post-processing. 

Column 34: S_p_CBBWS_err 
Error in corrected peak SB, which we define as 



[(AS ABS S P CB ' BWS ) 2 + 



(A5 PIX s: 



PIX r-CB,BWS\ 2 



>'+(i)T 



(30) 



where AS ABS is the absolute calibration error of the SB 
image and AS 11 "™ is the pixellation uncertainty (see Ap- 
pendices A and B). The suitability of this error in linear 
polarization is discussed in § 13.21 

Column 35: S_int_0BS 
Observed (raw) integrated SB, S° BS . 

Column 36: S_int_0BSCB 
Observed integrated SB corrected for clean bias, given by 



<iObs,cb _ gOBs npix AS 



(31) 



fit, 

This value may be useful for non-Gaussian blobs (see § I3.3[) . 

Column 37: S_int 
Integrated SB, S int , calculated by application of blob volume 
correction (equation l|17[)) to S^ s . 

Column 38: S_int_CB 
Integrated SB corrected for clean bias, S^ B t , calculated by 
application of blob volume correction (equation (jXTJ) ) to 
5' S ° BS ' CB . This is the final reported value of the blob's inte- 
grated SB, to be used for post-processing (though see com- 
ments in § 13. 3p . 

Column 39: S_int_CB_err 
Error in corrected integrated SB, which we define in a similar 
manner to S_p_CBBWS_err (see also § 13. 1[) as 



a s CB = V {AS^Sf^f + a 2 



(32) 



The suitability of this error in linear polarization is discussed 
in §E2 

Column 40: R_EST 
Size estimate of detected blob, 7? BST , in units of the sky 
area covered by an unresolved Gaussian blob with the same 
peak SB, taking into account local bandwidth smearing. To 
derive this estimate we first focus on an unresolved Gaussian 
blob with FWHM O, as defined by the image resolution, and 
peak SB S p , as measured from the detected blob. For this 
unresolved blob, the relationship between its full width at 
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a fraction Tf/A of its peak SB, which we denote ip, and its 
FWHM is given by 



(33) 



To calculate 7? EST we take the ratio between the measured 
area of the detected blob, H b iob, and the area of an ellipse 
with axes defined by equation (|33p . When the broadening 
effect of bandwidth smearing is included into this ratio, we 
get 



R EST — -Hbiob 



vr e„ 



aJ e min] a 

ZD Tf 



(34) 



The parameter J? EST is not intended to be used for quanti- 
tative analysis. In § [4] we discuss how R EST may be used to 
flag blobs that exhibit potentially complex (non-Gaussian) 
morphologies for follow-up. 

Column 41: VisArea 
BLOBCAT can optionally calculate the fraction of visible sky 
area, namely the fraction of non-blank pixels assuming use 
of an equal-area projection, over which a blob detected at 
position (r, s) could have been detected within the SB im- 
age. This is known as the blob's visibility area. This area 
may be used, for example, to calculate a completeness cor- 
rection when compiling source counts (e.g. Hales et al., in 
preparation). To calculate the visibility area, BLOBCAT takes 
into account spatial variations in both image sensitivity and 
bandwidth smearing. For non-blank pixels (x, y), the frac- 
tion of suitable sky area for detecting a blob with equal peak 
SB to that of a blob located at (r, s), where r £ x, s £ y, is 
obtained by counting the number of pixels that satisfy 



T d o-(x,y) < S p (r,s) 
uj{x,y) ^ uj(r,s) 



(35) 



2.7 Optional Outputs 

To aid visual inspection and post-processing of blobs, 
BLOBCAT can optionally produce two additional forms of out- 
put. The first is a modified SB FITS image in which all 
flooded pixels have been highlighted (reset to a large value; 
this value may be user-specified, s ee Appendix B) . The sec- 
ond is an image ov erlay in ds9 |jove fc Mandell l2003h or 
Karma (|Goochl .1996) formats, for use with their respective 
ds9 or kvis FITS viewers. The overlays present the identifi- 
cation number and boundaries in RA and Dec for each blob. 
To illustrate these two optional forms of output, an example 
output FITS image superposed with a kvis overlay is pre- 
sented in Fig. [5] BLOBCAT may be easily modified to produce 
overlays in other suitable formats, for example through use 
of the pywcs wrapper to WCSLIB. 



3 PERFORMANCE 

We have carried out Monte Carlo simulations to investigate 
the performance of BLOBCAT in total intensity and linear po- 
larization, as described in the following sections. 




3 n 27 m 31 s 30 s 39 s 28 s 27 s 
Right Ascension (J2000) 



Figure 5. Output FITS image and kvis overlay as produced by 
BLOBCAT, illustrating how three bl obs in the image ar e highlighted 
and identified (sample data from lNorris et aJ1l2006t) . 



3.1 Total Intensity 

3.1.1 Simulation Setup 

We tested BLOBCAT in total intensity by injecting Gaussian 
sources with peak SNRs between 5 — 100<r into images of 
Gaussian noise, inspecting the accuracy of the recovered SB 
and positional measurements. To compare BLOBCAT's flood 
fill approach with that of standard Gaussian fitting, we also 
carried out these simulations using IMFIT, a widely used 
Gaus sian source fitter from the MIRIAD package (|Sault et alj 
1995). Gaussian fitting routines such as IMFIT have bee n 
used by many surveys such as NVSS (ICondon et al.lll998T). 
Phoe nix (|Hopkins et all 120031 ). and SUMMS l|Mauch et al l 
l2003h . 

Two classes of source were tested, with the aim of 
demonstrating the virtues and limitations of BLOBCAT's mod- 
ified flood fill approach. The first were unresolved (point) 
sources, selected to demonstrate that flood fill algorithms 
need not be limited to the parameter space occupied by 
complex non-Gaussian sources. The second were highly (and 
somewhat pathologically) resolved Gaussian sources with 
FWHMs 5 times larger than the image resolution, prob- 
ing parameter space where parameterised Gaussian fitting 
methods are optimal. We did not quantitatively address per- 
formance relating to non-Gaussian sources because of the 
lack of an obvious standardised test source; qualitative dis- 
cussion of non- Gaussian blobs is presented in § 13.31 

We generated 125 independent samples per SNR bin us- 
ing noise images produced as follows. To realistically char- 
acterise the noise environment present in images of total 
intensity, we obtained Stokes V data from an individual 
pointing of t he mosaicked 1.4 G Hz aperture synthesis ob- 
servations of iNorris et al] l|2006h . We imaged this Stokes V 
data using 1" pixels, and convolved to a final circular reso- 
lution with (FWHM) 6 = 14". We found this image to be 
free of sources and artefacts. Using SExtractor (see S 12. 1 .2|) . 
we modified this Stokes V image for use as a master noise 
image by enforcing zero mean and unit variance throughout 
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sub-regions of size 150 independent resolution elements. The 
noise image for each sample was then produced by extract- 
ing a randomly positioned thumbnail image from the master 
noise image, from a pool of over 150, 000 choices. 

For each sample we measured the injected source's peak 
SB, integrated SB, and position using both BL0BCAT and 
IMFIT. We executed IMFIT using unconstrained Gaussian 
fit parameters, imitating a blind survey. For input point 
sources, we also executed IMFIT using a constrained fit, fix- 
ing the source size to the image resolution. We then com- 
pared the output values for these different methods with 
their true input values. To prevent source misidentification, 
we checked that each recovered source extended over its true 
input location. We describe the results of these Monte Carlo 
simulations for SB measurements in o| l3.1.2l and for positions 
in § 15X51 



3.1.2 Results and Discussion: Surface Brightness 
Measurements 

We performed our total intensity Monte Carlo simulations 
for a range of flooding thresholds (T/) and peak bias correc- 
tion factors (A), setting the detection threshold (Td) as small 
as possible so as to limit the induction of sampling bias in the 
lowest SNR bins. For reasons outlined in § I2.4.1l - l2~4.2l we 
found that optimal SB recovery was obtained using T; = 2.6 
and A = 3.5. 

In Fig. [6] we present the SB results of our simulations, 
where we have executed BL0BCAT with the optimal Tf and 
A values from above, we have executed IMFIT with uncon- 
strained Gaussian fit paramete rs, and where we have used 
median statistics (|Tukevl ll977T ) to robustly prevent noise 
outliers from biasing intrinsic source extractor properties. 
The results obtained from executing IMFIT with constrained 
point source fits, using the same simulation data as for the 
unconstrained fits, are presented in Fig. [7] To put BLOBCAT's 
performance in perspective, we first discuss the results from 
IMFIT, starting with the unconstrained fits from Fig. [6] 

The strength of IMFIT is its ability to perform least 
squares fitting in order to separate smooth underlying 2D 
elliptical Gaussians from superposed noise fluctuations. A 
key requirement of this process is that there are sufficient 
degrees of freedom (DOFs) to fit the position, peak SB, ma- 
jor and minor axis, and position angle parameters. Given 
that the number of DOFs is related to the number of inde- 
pendent resolution elements within the fitting region, it is to 
be expected that IMFIT will struggle to constrain multiple 
fit parameters for point-like input sources. This is reflected 
in the IMFIT results from Fig. [6] where the systematic bias 
in integrated SB measurements for point sources (top-right 
panel; >15% at 5a) demonstrates IMFIT's inability to si- 
multaneously constrain peak SB and angular dimension pa- 
rameters. For these point sources, which by definition have 
the dimensions of a single resolution element and therefore 
contain essentially one piece of information, namely their 
brightness, least squares fitting is easily coerced into includ- 
ing adjacent noise peaks into the fit. However, for resolved 
sources, which by definition extend over multiple indepen- 
dent resolution elements, least squares fitting becomes less 
likely to misinterpret noise features as true signal, and so 
becomes more accurate. 

The systematic positive bias exhibited by IMFIT in its 
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Figure 6. Performance of BLOBCAT (points) and IMFIT (shad- 
ing) in total intensity for input unresolved (top row) and re- 
solved (FWHM = 5 X image resolution; bottom row) Gaussian 
sources. Measurements of peak (left column) and integrated (right 
column) SB over a range of input peak SNRs are summarised 
by their median (points/curves), and first and third quartilcs 
(whiskers/shading). Dashed curves trace median measurements 
resulting from exclusion of the peak bias correction for resolved 
sources (equation Q14I ). Fit parameters for IMFIT are uncon- 
strained. For reference, expected random errors are indicated by 
the median absolute deviation (MAD PS 0.6745cr; dotted curves). 
Note that the y-axis range differs between rows. 



cd 






fac 


c 


1.3 


u 




3 






m 


'a 
7 


1.2 


-d 




CD 






act 


no) 


l.l 








+j 




l 




m 


H 


m 




CD 




-a 




0.9 


CD 




w 








M 




ma 


Bri 


0.8 


u 






o 






■z. 







PEAK SB 

i i i r 

BLOBCAT (peak) 
BLOBCAT (int) 
MAD 

IMFIT (tixed) 



INTEGRATED SB 





5 10 20 50 100 5 10 

Input Peak Signal-to-Noise Ratio Sp RUE, /cr 



20 50 100 
TRUE , 



Figure 7. Reproduction of top row of Fig. [75J but here displaying 
IMFIT results for point source fits with angular dimensions fixed 
to the image resolution. 



measurements of integrated SB for point sources leads to two 
systematic effects. First, given that the integrated to peak 
SB ratio is typically used to select whic h measure best char - 
acterises the flux density of a source fe.g. lHuvnh et al.ll2005r i. 
the flux densities of faint sources will be systematically over- 
estimated. Second, this ratio is of ten used to estimat e decon- 
volved angular source sizes (e.g. iHuvnh et al1l2005r ). which 
too will become positively biased for faint sources. We com- 
ment on this ratio further in § 14.11 

We now turn to IMFIT's performance from Fig. [7] When 
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there is prior knowledge that a source is unresolved, IMFIT 
can be constrained to fit a point source, fixing its fitted 
dimensions to those of the image resolution. Comparing the 
results from Fig.[6]with those of Fig. [7] we find that the point 
source assumption reduces IMFIT's integrated SB bias, but 
does not completely eliminate it. Left behind is a marginal 
positive bias at low input SNR, caused by IMFIT's residual- 
minimisation strategy to pull fitted sources towards noise 
peaks that are directly adjacent to true source peaks. We 
comment further on measured positions in § 13.1.31 

Returning to the BLOBCAT results from Fig. [6J we find 
that the recovered peak and integrated SB measurements for 
point sources are systematically unbiased. This performance 
enhancement over IMFIT is due to the reduced influence that 
nearby noise features can exert over BLOBCAT's integrated SB 
measurements. Only directly connected noise features can 
affect flood fill, when the algorithm spills into adjacent noise 
peaks and is eventually limited by Tj, whereas strong noise 
peaks separated by a noise trough from the true source may 
be least square minimised by IMFIT as statistical fluctuations 
superposed on a resolved source. 

For the resolved source investigated, IMFIT clearly out- 
performs BLOBCAT in avoiding integrated SB systematics. 
However, BLOBCAT's systematic underestimate is no worse 
than ~ 5%, even for sources with peak SNR = 5. As indi- 
cated in Fig. [6] this underestimate would be more severe if 
the peak bias correction from equation (|14[1 were neglected; 
failure to debias the peak SB causes equation (1171) to under- 
estimate the integrated SB. We attribute BLOBCAT's difficulty 
in collecting the full integrated SB for resolved sources to an 
analogous 'negative' version of our peak SB correction. As 
sources become more resolved, it becomes more likely that 
negative noise features may limit the spatial extent avail- 
able for the flood fill algorithm to explore. This behaviour is 
not completely offset by positive noise features contributing 
to the spatial extent of sources, and so a bias is produced. 
Given how mild the resulting bias is, even for the patholog- 
ically resolved source tested, we do not attempt to correct 
for it within BLOBCAT. 

To estimate the uncertainty in BLOBCAT's measurements 
of peak and integrated SB, we use equations (|30[) and (I32|) . 
These errors are indicated by dotted lines in Fig. [6] we ne- 
glect the absolute calibration error (AS ABS ), and set the 
pixellation error (A5" PIX ) to 0.5% (cf. Appendix A). We do 
not reduce the factor of a in equation (|32[) by, for example, 
the square root of the number of independent resolution el- 
ements within the spatial extent of the source, as might be 
appropriate for methods that produce systematically unbi- 
ased integrated SB measurements. Instead, we define equa- 
tion (|32[) in a similar manner to equation (I30|l . so as to ar- 
tificially account for BLOBCAT's systematic underestimate of 
integrated SB for resolved sources. In this way, the error esti- 
mates produced by BLOBCAT realistically encapsulate its true 
performance. Note that in practice, resolved sources will al- 
most always be less resolved than for our simulated resolved 
source here. This implies that our catalogue error estimates 
are unlikely to underestimate true SB measurement errors. 

3.1.3 Results and Discussion: Position Measurements 

BLOBCAT catalogues three positions for each detected blob: 
the raw peak pixel, an area centroid using equation (|26[) , 




Figure 8. Accuracy of positions measured by IMFIT (shading; 
left column) and BLOBCAT (points for the peak pixel, centroid, 
and SNR-wcighted centroid; right column) in total intensity for 
input unresolved (top row) and resolved (bottom row) Gaus- 
sian sources; median statistics are displayed (similar formalism 
to Fig. [6}- The dashed curve (top- left panel) traces median mea- 
surements for constrained IMFIT point source fits with angular di- 
mensions fixed to the image resolution. For reference, the dotted 
and dot-dashed curves (identical in each panel) indicate expected 
median positional offsets using equations d 36 1) and H37I I. respec- 
tively. The left y-axis for each panel denotes position offset from 
the true input source position in units of the circular resolution 
FWHM (0 = 14"); the right y-axis denotes this offset in units of 
pixel width (1"). Note that the y-axis range differs between rows. 
For clarity, the bottom-right panel shows only centroid and SNR- 
weighted centroid measurements; the inset provides peak pixel 
measurements in a zoomed-out view of this panel. 



and a SNR- weighted centroid using equation (J2TJ) . In Fig. [8] 
we compare the accuracy of these measurements, as well 
position measurements from IMFIT, in recovering the true 
input positions for our simulated unresolved and resolved 
sources. 

Fig. [5] indicates that of BLOBCAT's three position mea- 
surements, the weighted centroid is optimal for both unre- 
solved and resolved Gaussian sources. The superior perfor- 
mance of the peak pixel position for unresolved sources is 
an artefact of injecting sources centred on a pixel; in general 
the performance of this position measure will be poorer. For 
resolved sources, the raw peak position is easily corrupted 
by the peak bias effect described earlier in § 12.4.11 For both 
unresolved and resolved Gaussian sources, the area centroid 
exhibits limited accuracy due to its lack of pixel weighting. 

For faint unresolved sources, BLOBCAT's positions are 
more accurate than those of IMFIT's unconstrained Gaussian 
fits; IMFIT is limited in its accuracy due to its optimisation 
attempts to accommodate adjacent noise features through 
least squares minimisation. For the pathologically resolved 
source simulated, IMFIT's position measurements are more 
accurate than BLOBCAT's. 

To estimate the uncertainty in BLOBCAT's weighted cen- 
troid positions, we first look to an uncertainty estimate for 
IMFIT. For plotting purposes, the median positional offset 
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exhibited by IMFIT can be estimated as the median of the 
quadrature sum of two zero-mean signals representing RA 
and Dec measurements with error a a (equation (I18[) ) and as 
(equation ([22])), respectively. By using a factor of V8 In 2 « 2 
instead of 1.4 in eq uations (1211) and (124[) as suggested for 
Gaussian fitting by ICondonl (1 19971 ) , neglecting calibration 
and frame errors, using G = @ a = Qs for a circular beam, 
and noting that the median off set about an inp ut position in 
2D is given by the median of a lRavleighl (| 18801 ) distribution, 
we evaluate the expected median positional offset for IMFIT 
as 



pos. offset n 



(36) 



This estimate is indicated by the dotted curve in each panel 
of Fig. m 

Equation ()36|) suitably encapsulates the positional un- 
certainties exhibited by both IMFIT and BLOBCAT for unre- 
solved sources. However, for our heavily resolved source, it 
systematically underestimates the positional uncertainties 
exhibited by both the Gaussian fit and flood fill approaches. 
To avoid complexity we do not attempt to explicitly param- 
eterise the increased positional uncertainty displayed for re- 
solved sources. Instead, we have chosen to simpl y modify 
the p ositional uncertainty equations presented by ICondonl 
( 1997) to use a factor of 1.4 (instead of ~ 2), as presented in 
equations (|21|) and (|24p . These modified equations lead to a 
more appropriate estimate for the median positional offset, 



rr .BLOBCAT 

pos. offset mcdian 



e 

1AA 



(37) 



as indicated by the dot-dashed curve in each panel of Fig. [8] 
The factor of 1.4 was selected empirically to ensure that 
for Gaussian sources with sizes ranging from unresolved to 
the heavily resolved source tested, positional uncertainties 
may be systematically estimated to within ~ 5% of a beam 
FWHM. We note that the factor of 1.4 is also suitable for 
use with IMFIT (see left panels in Fig. [8} . 

3.2 Linear Polarization 

3.2.1 Simulation Setup 

We tested BLOBCAT in linear polarization, L RM , in a similar 
manner to that described in § 13.1.11 for total intensity. We 
tested the same two classes of source, sampling input peak 
SNRs between 6 — 100o- RM (cf. equation (J5)); also § I2.4.2[) . 
For comparison, we also tested the performance of IMFIT 
using both constrained and unconstrained Gaussian fit pa- 
rameters. 

We generated each of the 125 sample images per SNR 
bin as follows. We assumed an illustrative observational 
band centred on 1396 MHz with width 200 MHz, split into 
25 x 8 MHz channels. For each frequency channel we ob- 
tained two independent noise thumbnails from the master 
noise image (cf. § I3.1.1[) . which we used to represent Stokes 
Q and U noise. A point (or resolved) source with a RM of 
— 100 rad m -2 , unresolved in Faraday space, was then suit- 
ably injected into each of the Stokes Q and U images across 
the band. We define the peak SNR of these injected sources 
as the ratio between their true input peak polarized SB and 
o~ rm ■ Using RM synthesis dBrentiens fc de Bruvnll2005l ) and 
RMCLEAN (|Heald et al.l 120091 ). images of L» M were then 
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Figure 9. Surface brightness measurement performance of 
BLOBCAT in linear polarization, L aM ; the display layout is du- 
plicated from Fig. [6] Fit parameters for IMFIT are unconstrained. 
No corrections for polarization bias have been applied. 



produced in accordance with equation ([T]). For each sample 
we then recovered the peak and integrated SB using both 
BLOBCAT and IMFIT. We describe the results of these Monte 
Carlo simulations for SB measurements in § 13.2.21 and for 
positions in § 13.2.31 



3.2.2 Results and Discussion: Surface Brightness 
Measurements 

We performed our linear polarization Monte Carlo simula- 
tions using a range of Td, Tf, and A parameter values, find- 
ing that the optimal total intensity value of A = 3.5 was 
suitable for use in polarization as well. This behaviour of 
A can be understood by comparing profiles through sources 
embedde d within images of total intensity and i/RM , as pre- 
sented bv lHales et alj (|2012T ). They show that above Tf = 4, 
Gaussian sources embedded within these two environments 
are very similar in morphology, modulo statistical fluctua- 
tions. For this reason, the relevant cross-sectional area for 
the peak bias correction, H\ in equation (|13[) . may be ob- 
tained for images of L BM using the same value of A as was 
recommended for total intensity. Using this value, we found 
that integrated SB recovery was optimised when flooding 
down to Tf = 4.0, as discussed earlier in § 12.4.21 

In Fig. |9]we present the results of our simulations, where 
we have executed IMFIT using unconstrained Gaussian fit 
parameters with a 4<7 RM cut-off fitting threshold (same as 
Tf). The results obtained from the same simulations by exe- 
cuting IMFIT with constrained point source fits are presented 
in Fig. ITDl 

The strong systematic biases exhibited by IMFIT in 
Fig. [5] suggest that its unconstrained fits are unsuited to 
the statistical environment of L RM . We attribute this to a 
breakdown in the assumption that sources are superposed 
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Figure 10. Reproduction of top row of Fig. [9] but here displaying 
IMFIT results for point source fits with angular dimensions fixed 
to the image resolution. 



with Gaussian noise fluctuations, as required to perform ro- 
bust least squares minimisation. When IMFIT's angular size 
parameters are fixed to the image resolution, the systematic 
biases in measured SB for input point sources are dimin- 
ished, as shown in Fig. 1101 Through further experimenta- 
tion, we found that systematic IMFIT biases were unavoid- 
able for all but the most manual, uniquely-constrained fits. 
Reduction or removal of the 4<7r M cut-off threshold, used 
to prevent faint pixels from entering the Gaussian fitting 
process, was found to worsen systematic trends. We found 
similar biases to those described above when using IMFIT in 
images of standard linear polarization, L. 

In contrast, the results from Fig. [5] indicate that 
BLOBCAT's measurements of peak and integrated SB are, in 
effect, systematically unbiased. We justify this claim as fol- 
lows, beginning with peak SB performance. 

The small systematic positive bias exhibited by the re- 
covered peak SB is due to the positive semi-definite nature 
of L RM 3* 0; this effect, which is extrinsic to BLOBCAT, is 
known as polarization bias. Because of the intimate relation- 
ship that exists between polarization bias and the specifics 
of observational setup, as elucidated shortly, BLOBCAT makes 
no attempt to correct for this bias. To illustrate the va- 
riety and complexity of schemes that may be applicable 
to d ifferent data, we note that corrections designed for L 
(see iLeahv fc Ferninil Il989h are not suitable for Lrm be- 
cause they are govern ed by different statistical distribu- 
tions (jHales et al.ll2012r ). Furthermore, no fixed (unparame- 
terised) correction schem^ll is suitable for L Rm , because the 
statistical properties of L RM are dependent on the under- 
lying observational characteristics o f the data such as fre- 
quency coverage and channel width l|Hales et alJl20l3 l. In- 
stead, more computationally expensive schemes to correct 
for polarization bias, and potentially Eddington bias (which 
affect s the measured SB of unresolved sources; [Eddingtonl 
Il913h . may be required (Hales et al., in preparation). To al- 
leviate polarization bias in BLOBCAT's measurements of peak 
SB, users must independently apply their own suitably se- 
lected correction scheme. 



7 We note that George et al . (2011) recently proposed a fixed 
correction scheme for L R m- As their scheme implicitly assumes 
a specific observational setup, its applicable parameter space is 
limited. 



BLOBCAT appears to accurately recover measurements of 
integrated SB for unresolved sources, apart from a positive 
bias exhibited at low input SNR. This latter behaviour is 
due to polarization bias, which affects sources whose pixel 
magnitudes are predominantly at low SNR. However, this 
bias is not of significant consequence because, on average 
for these sources, their ratios of integrated to peak SB will 
not deviate significantly from 1. In these cases, their peak 
values will best represent their flux densities (cf. § 13.1.21 also 
§ 14.11) . which need only be corrected for polarization bias in 
order to deliver systematically unbiased measurements. 

Turning to BLOBCAT's measurements of integrated SB 
for highly resolved sources, their unbiased nature appears 
to be due to the fortuitous cancellation of two systematic 
effects. The first of these is the negative bias for resolved 
sources, as seen earlier for total intensity (lower-right panel 
of Fig. [6]). The second is the positive polarization bias dis- 
cussed above. We conjecture that the cancellation of these 
two effects is robust, regardless of the observational setup 
dictating the specific statistical description displayed by the 
input L RM (or L) image. Our justification for this assertion 
is that the dominant statistical differences between images 
of I/rm for different observational setups, or bet ween images 
of L R M and L, occur below a threshold of 4<t R m |Hales et al.l 
2012). Given that BLOBCAT ignores data below this cut-off 
threshold (for our recommended 2/ = 4.0), we are confi- 
dent that any systematic blob-extraction differences between 
these images will be below the noise level. 

Regarding SB measurement uncertainties, we mirror 
the earlier discussion of total intensity uncertainties from 
§ 13.1.21 We note that equations ()30|) and ()32|) suitably reflect 
BLOBCAT's measurement errors in linear polarization, as ex- 
hibited by the dotted lines in Fig. [9] We therefore leave these 
equations unchanged for use in linear polarization analysis. 



3.2.3 Results and Discussion: Position Measurements 

In Fig. [11] we compare the accuracy of position measure- 
ments using both BLOBCAT and IMFIT in recovering the true 
input positions for our simulated unresolved and resolved 
sources. As with SB measurements (§ I3.2.2|) . we find that 
unconstrained Gaussian fitting is not appropriate for de- 
termining source positions in linear polarization. Following 
from the discussion for positional measurements in total in- 
tensity (§ I3.1.3p . we note that BLOBCAT's weighted centroid 
positions are also suitable for use in linear polarization, as 
are the uncertainty estimates using equations (|21[) and (|24l) . 



3.3 Complex Blobs 

In this section we qualitatively discuss BLOBCAT's perfor- 
mance when analysing blobs that exhibit complex (resolved, 
non-Gaussian) morphology. We do not seek to quantitatively 
address this performance due to the lack of clear standard- 
ised test sources. Possible examples of complex blobs include 
supernova remnant shells, extended lobes of radio galax- 
ies, radio relics, and extended Galactic emission; we discuss 
how these blobs may be automatically identified and flagged 
for follow-up using BLOBCAT in § 0] Other examples include 
blended blobs that consist of multiple overlapped individual 
Gaussians; we discuss these in § 14.21 
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Figure 11. Positional accuracy of BLOBCAT and IMFIT in linear 
polarization, L R m', the display layout is duplicated from Fig. [8] 



For each detected blob, BLOBCAT assumes 2D elliptical 
Gaussian morphology (§ I2.3p so as to infer a debiased peak 
SB and a corrected integrated SB (§ 12. 4[) . If a detected blob is 
not of true Gaussian morphology, then its debiased peak SB 
is unlikely to be significantly affected. This is because use 
of A = 3.5 in calculating the relevant cross-sectional area 
susceptible to peak bias (using equation (|14| l1 is still likely 
to be a suitable choice for non-Gaussian blobs. It is more 
difficult to generalise the systematic manner in which mea- 
surements of corrected integrated SB may differ from their 
true values. The simplest observation is that low SNR blobs 
are more vulnerable than high SNR blobs to systematic error 
in their measurements of corrected integrated SB (cf. equa- 
tion l|17p). However, the fraction of blob volume remaining 
unflooded below Tf will be small for a low SNR blob that is 
highly-resolved, suggesting that in general, uncorrected in- 
tegrated SB measurements will be more accurate than cor- 
rected integrated SB measurements in estimating flux den- 
sities for a majority of complex blobs. We have verified the 
general statements above by testing BLOBCAT's performance 
in handling sources with a range of complex morphologies. 
We find that BLOBCAT's performance for slightly extended 
non-Gaussian blobs that consist of blended Gaussian com- 
ponents, where the approximation of 2D elliptical Gaussian 
morphology is poor, is in general poorer than the simulation 
results presented earlier for pathologically resolved Gaus- 
sian blobs. However, alternatives for handling such blobs 
more suitably in post-processing are available, as discussed 
in § 14.21 For highly extended non-Gaussian blobs, BLOBCAT's 
measurements of uncorrected integrated SB are in general 
quite accurate because the fraction of unflooded blob volume 
is always very small. 

In Fig. [12] we present two sample non-Gaussian blobs 
in an attempt to illustrate their potential for integrated SB 
error. Users should judge for themselves whether corrected 
(<Sint) or uncorrected (S^ s ) measurements of integrated SB 
best describe the flux densities of their complex blobs; to 
assist with this decision, BLOBCAT reports both values in its 
output catalogue. If the two values differ by more than a few 
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Figure 12. When confronted with a non-Gaussian blob (two 
arbitrary resolved samples illustrated; solid curves), BLOBCAT as- 
sumes an idealised Gaussian morphology (dashed curves at equal 
peak SNR, A) so as to infer the fractional volume remaining un- 
flooded below the cutoff threshold (Tf). If this assumption is par- 
ticularly poor, as suggested by the example in the lower panel, 
then the resulting measurement of volume-corrected integrated 
SB (using equation H17I I) may become systematically biased away 
from the blob's true flux density. For such blobs, the uncorrected 
measurement of integrated SB is likely to act as a less-biased 
estimator of true flux density. 



percent, then the corrected values may be unsuitable, and 
manual inspection is recommended. 

Similarly, users should determine which BLOBCAT posi- 
tion measurement is most appropriate for each of their com- 
plex blobs; the SNR-weighted centroid may be inappropriate 
for some blobs. For example, the weighted centroid position 
for an arc-shaped radio relic (i.e. a crescent moon shape) 
may be situated beyond the boundaries of its flooded pixels; 
the raw peak pixel or area (unweighted) centroid position 
may be more appropriate. To aid users, BLOBCAT catalogues 
all three position measurements. In addition, flags are pro- 
duced (see § 12.6ft so as to indicate whether the centroid po- 
sitions are situated within or exterior to the flooded pixel 
confines of each blob. 



4 POST-PROCESSING 

BLOBCAT is designed to produce an output catalogue that 
details basic properties of blobs in an image. Depending on 
the nature of the data and the requirements of the user, 
additional processing may be required to make full use of 
the catalogue. 

In this section we highlight two such examples of post- 
processing. We first consider a selection procedure for de- 
termining which SB measurement (peak or integrated) best 
describes the flux density of a blob. We then consider a pro- 
cedure for identifying and analysing blobs that exhibit non- 
Gaussian morphologies. 
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4.1 Blob Flux Densities 

The choice of whether to represent a blob's flux density by 
its measured peak or integrated SB is equivalent to asking 
whether the blob is unresolved or not. If it is unresolved then 
the peak SB should be used (explained as follows; note also 
Appendix A), while for resolved blobs it is the integrated 
SB that should be used. 

The user is responsible for selecting which of the mea- 
surements of peak or integrated SB best represent the true 
flux density for each detected blob. We do not automate 
this process for the same reason that Gaussian fitting tasks 
such as IMFIT do not, namely that noise features adjacent 
to faint, unresolved sources may render integrated SB mea- 
surements less likely to represent true flux densities than 
peak SB measurements. 

If a user is only interested in a small number of blobs, 
then as with IMFIT, more attention can be paid to each indi- 
vidual fit so as to minimise potential fitting errors, for exam- 
ple through fitting constraints in IMFIT or perhaps suitable 
pixel masking prior to running BLOBCAT. For such carefully 
fitted blobs, their integrated SB measurements may be used 
to represent their true flux densities, even if they are faint or 
unresolved. However, for large sample sizes (e.g. for a sur- 
vey), it is impractical to consider implementation of such 
manual, or perhaps even machine-learning enabled, fitting 
procedures. Indeed, attempting to manually fit each source 
in a survey may inadvertently bias the resulting flux density 
measurements due to subjectivity on behalf of the user. 

Instead, a more appropriate strategy may be initiated 
by taking the ratio between integrated to peak SB measure- 
ments for each blob, so as to characterise the global variance 
in this ratio as a function of measured SNR. By considering 
the parameter space populated by noise-affected blobs with 
S int < S p , an envelope can be designed as a function of SNR 
to select which of the blobs with S int > S p are likely to be 
similarly affected by noise. Only those blobs with ratios in 
excess of the envelope criterion may be deemed resolved, and 
in turn have their flux densities represented by their inte- 
grated SB measurements. All other blobs should have their 
flux densities represented by their peak SB measurements. 
This strategy has been employed for IM FIT-based surveys of 
total intensity, e.g. iHuvnh et al] (|2005l ); application to total 
intensity and linear polarization surveys with BLOBCAT will 
be presented by Hales et al. (in preparation). 

If a blob is resolved, then an estimate of its deconvolved 
size may be obtained directly from its integrated to peak SB 
ratio (via division of equation JSJ by equation @), namely 

"sT _ e~ e- ' (38) 

where the deconvolved angular size can be estimated us- 
ing the geometric mean as Vdeconv ~ sj Wr 4>s — Omaj 6 min . 
Again, illustrations of thi s procedure are av ailable in to- 
tal intensity using IMFIT (|Huvnh et all 120051 ). and will be 
presented for total intensity and linear polarization with 
BLOBCAT by Hales et al. (in preparation). 



(i.e. not confusion limited) with Gaussian sources. However, 
if complex blobs are present (cf. § I3.3J1 this assumption may 
not always be suitable, requiring additional processing of the 
complex objects so as to suitably characterise their proper- 
ties. Before commenting on this processing, we briefly out- 
line a simple procedure by which complex blobs may be first 
identified. 

In equation (|34[) we defined the parameter R EST , which 
estimates the size of a detected blob in units of the sky area 
covered by an unresolved Gaussian blob with the same peak 
SB. If J? BST is large, it indicates that a blob is unlikely to 
be unresolved. 

To illustrate how this parameter may be used to iden- 
tify potentially complex blobs for follow-up, we preview the 
general processing steps performed by Hales et al. (in prepa- 
ration) to catalogue sources in radio-wavelength images of 
total intensity and linear polarization; details of these im- 
ages are not pertinent to the discussion here, apart from not- 
ing that they consist mostly of compact sources (i.e. there 
are no wide-spread extended image features) . Hales et al. (in 
preparation) find that a value of i? BST > 1.4 is well-suited for 
automatically flagging complex blobs. Gaussian fits are at- 
tempted for each of these flagged complex blobs with IMFIT 
to determine which ones are likely to consist of single or 
multiple overlapped (blended) Gaussians. This procedure is 
semi-automated to require only two initial manual inputs 
to IMFIT: the number of potentially overlapped Gaussians, 
and their cursory positions. We note here that standard 
digital imaging tech niques such as the Laplacian of Gaus- 
sian operation (e.g. So nka. Hlavac fc Bovkj 120081) which is 
imple mented within the AEGEAN algorithm ( Hancock et al.l 
20121). blob decompo sition algorithms such as CLUMPFIND 
( Willi ams et al.l 1994 ), or the wi dely used Watershed trans- 
form dRoerdink fc M ciistcr 2000l). may be well suited to per- 
forming this step automatically. Hales et al. (in preparation) 
preserve the original BLOBCAT measurements for those blobs 
that are best fit by a single Gaussian. For each blob identified 
as being blended, they replace its original BLOBCAT catalogue 
entry with multiple IMFIT entries for each individual Gaus- 
sian component identified. Remaining from this procedure 
are a small number of extended, non-Gausssian blobs that 
cannot be adequately refit using IMFIT (as identified due to 
their large fitting residuals; we note here that image artefacts 
may also be included in this list, though too many artefacts 
could indicate undervaluation of rms noise estimates). For 
each of these remaining blobs, Hales et al. (in preparation) 
preserve the original BLOBCAT measurements and perform a 
final manual inspection to determine which of the integrated 
SB measurements should be used to represent the blob's flux 
density (uncorrected or corrected; § 13. 3[) . 

We envisage that the above procedure may be quickly 
and easily replicated for future surveys. By performing 
Gaussian fitting for only those blobs that BLOBCAT indicates 
may be complex, it should be possible to robustly and auto- 
matically catalogue all but the most non-Gaussian of sources 
in an image. 



4.2 Blob Decomposition 

BLOBCAT assumes that isolated blobs are of Gaussian mor- 
phology in order to catalogue their properties. This assump- 
tion will work well for images that are sparsely populated 



5 SUMMARY AND CONCLUSIONS 

We have described BLOBCAT, an algorithm designed to iden- 
tify and catalogue blobs in a 2D FITS image of Stokes / 
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intensity or linear polarization (L or L RM )- Utilising a Gaus- 
sian morphology assumption and two key bias corrections, 
BLOBCAT equips its core flood fill algorithm with the tools 
necessary to perform robust SB measurement. 

Written in Python, BLOBCAT is easy to use and easy 
to modify. It is well-suited to analysis of large blind sur- 
veys, requiring little manual intervention for images sparsely 
populated with unresolved and resolved Gaussian sources, 
and having the ability to account for spatial variations in 
both image sensitivity and bandwidth smearing. To indi- 
cate BLOBCAT's ability to swiftly analyse data, we note that 
Hales et al. (in preparation) produce a catalogue of ~ 1000 
blobs from an image with ~ 10 000 x 10 000 pixels, including 
the use of equal-sized rms and bandwidth smearing images, 
in less than 60 seconds on a standard desktop computer. 
While source extractors built around Gaussian fitting rou- 
tines are competitive with BLOBCAT in this raw computing 
time, though such comparison is implementation-dependent, 
subsequent overheads associated with manual source inspec- 
tion may be greatly minimised when using the latter. This 
is because unresolved and resolved Gaussian blobs are au- 
tomatically and accurately processed by BLOBCAT, requiring 
only non-Gaussian blobs to be manually addressed. 

Accurate estimates of background rms noise are re- 
quired to ensure robust and accurate operation of BLOBCAT. 
We described a simple, objective, and automated procedure 
by which these estimates may be obtained, which makes use 
of local background mesh calculations. We note that this 
procedure may be used to estimate background rms noise 
for use with any source extractor, not just BLOBCAT. 

We have demonstrated the performance of BLOBCAT 
through Monte Carlo simulations of unresolved and re- 
solved Gaussian sources. We benchmarked this performance 
against that of standard Gaussian fitting, finding compara- 
ble results in total intensity, and vastly superior results in 
linear polarization. Our simulations indicate that Gaussian 
fitting is inappropriate for use in linear polarization for all 
but the most manually-constrained of fits. BLOBCAT contains 
at present the only algorithm capable of robustly catalogu- 
ing accurate flux densities for resolved or extended sources in 
linear polarization, without incurring significant systematic 
biases. 

In closing, we note that BLOBCAT may be suitable 
for cautious application to image data at non-radio wave- 
lengths, such as optical, provided that the flooding SNR cut- 
off is set to a value high enough to avoid measurement sys- 
tematics induced by low-SNR statistics. Optical pixel shot 
noise (the Poisson regime) is non-Gaussian at low-SNR and 
limits to Gaussianity at higher SNR, much like the statis- 
tics of linear polarization that can be accommodated by 
BLOBCAT. Modification of BLOBCAT's algorithms may be re- 
quired to account for wavelength- and instrument-specific 
descriptions of point spread functions and pixellation errors. 

The BLOBCAT program, supplemented with test data 
to illustrate its use, is available electronically through the 
World Wide Web at: http://blobcat.sourceforge.net/ . 
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APPENDIX A: PIXELLATION ERROR 

In radio synthesis imaging, the number of pixels per resolu- 
tion element (synthesised beam) can be adjusted after the 
original observations have been made. This is because raw 
data are obtained in the Fourier plane, enabling post facto 
over-sampling of data in the image plane. By comparison, 
optical observations are often under-sampled in the image 
plane, requiring ingenious methods to utilise their full reso - 
lution (e.g. the Drizzle algorithm bv lFruchter fe H ook 2002). 

In this Appendix we present implications for SB mea- 
surements when sampling a radio image with insufficient pix- 
els. We use the term 'pixellation error' to refer specifically 
to the systematic undervaluation of peak SB measurements 
due to imaging and fitting effects. We focus on the pixella- 
tion error exhibited by two methods of peak SB measure- 
ment for unresolved sources. We first derive a relationship 
for the pixellation error exhibited by measurements of ob- 
served (raw) peak SB. We then compare this peak pixel error 
to that exhibited by the fitted peak of a 2D parabola, where 
the fit is obtained using a 3 x 3 pixel array about the raw 
peak pixel (e.g. as implemented in the MIRIAD task MAXFIT) . 
We conclude by commenting on the manner in which image 
pixellation affects measurements of integrated SB. 

In conventional radio synthesis imaging, the sky is as- 
sumed to be represented by delta functions; each image pixel 
is thus a spot sample, as opposed to other sky representa- 
tions such as piecewise-constant pixels, which require inte- 
grals over regions to be computed. To represent the visibil- 
ity data, sources in images deconvolved using the iterative 
CLEAN algorithm will be of the form l|Briggs fc Cornwelll 
fl99l: lBri gg d H995l) 

S OB3 (x, y) = [BF * SRC * BEAM] (x, y) , (Al) 

where S OBS (x,y) is the observed source SB distribution at 
pixel coordinate (x,y), the asterisks indicate convolution, 
BF is a basis function that depends on whether the source 
is centred directly on a pixel or not, SRC represents the 
clean component model of the source, and BEAM is the 
restoring beam. We assume that BEAM is Gaussian. 

We define e OBS = S , ° BS /S™ UE as the fraction of true 
peak SB observed within the peak pixel of an unresolved 
source. We assume N a and Ns pixels per projected resolu- 
tion element such that a pixel dimension is O a /N a x /Ns ; 
here, Q a and <ds are the major and minor FWHMs that 
characterise the image resolution (see introductory remark 
in § 12. 3[) . as projected along the RA and Dec axes of an 
image (see equations (|20|l and JUJ). 

When the true peak for an unresolved 2D elliptical 
Gaussian is centred directly on a pixel, which we denote 
the 'on-pixeF case, both the BF and SRC terms in equa- 
tion (|A1|) are given by delta functions. The source SB dis- 
tribution is therefore described by an unattenuated 2D el- 
liptical Gaussian with s^SL^ = 1, regardless of the values 
of N a and N 6 . 

When the true peak is centred half-way between pixel 
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Off-Pixel 




Pixels per Circular Resolution Element FWHM 

Figure Al. Peak surface brightness underestimation due to 
pixcllation; we term this pixellation error. Shown in the upper 
panel are two unresolved ID Gaussians with true peak brightness 
S™ UE , sampled with 5 (left) and 4 (right) pixels per FWHM. 
Their true peaks are centred directly on (left) or half-way be- 
tween (right) pixels. The observed central peak pixel(s) underes- 
timates the true peak brightness of an unresolved 2D elliptical 
Gaussian by e OBS . as illustrated in the lower panel for the best- 
case (true peak centred on a 2D pixel; solid curve), worst-case 
(true peak placed at the intersection of 4 pixels; dashed curve), 
and intermediate-case (right-slant shaded) pixellation of a circu- 
lar resolution element (i.e. assuming N a = Ng). Similarly, the 
underestimate exhibited by the fitted peak of a 2D parabola, 
e FIT , is illustrated in the lower panel for the best-case (dot- 
dashed curve), worst-case (dotted curve), and intermediate-case 
(left-slant shaded) pixellation scenarios. 



centres, which we denote the 'off-pixel' case, SRC is again 
a delta function (representing a point source) and BEAM 
is a Gaussian, but now BF must consist of a sine function 
in order to represent the visibility data for a shifted delta 
function. We find that E™j cntlc is therefore given by 



OBS 
^off-ccntn 



. (ttI) sin (7rm) 



exp 



(.2/1/2 



(In [2] 



dl dm . 



(ax/2 - iy 



+ 



(A2) 



evaluated at Xx/2 = Vi/2 = 0.5. 

In Fig. IA1I we display e OBS for the on- and off-pixel 
cases from above; to conform with visual expectations, in 
the upper panel we plot ID source profiles and their corre- 
sponding ID pixel values by using a simplified ID version 
of equation (|A2|) (for which only one integral is required). 
When the underlying true peak for an unresolved source is 
centred (in 2D) part-way between the on- and off-pixel cases, 
e OBS is given by a value between these two solutions, as illus- 
trated by the shading in the lower panel of Fig. IA1I We note 
that the effect of the sine function in our off-pixel analysis is 



essentially negligible, only affecting the plotted curves closer 
to ~ 1 pixel per FWHM. Nevertheless, we have included the 
calculation for completeness. 

In principle, the pixellation error exhibited by measure- 
ments of observed peak SB (e OBS ) may be minimised by 
imaging with a large number of pixels per resolution ele- 
ment. However, in practice, limited computing resources will 
often prevent the production or subsequent analysis of such 
heavily sampled images. Rather than increasing the image 
sampling N a and Ng , the accuracy of peak SB measurements 
may be increased by performing a fit to the peak value using 
a 2D parabola; we denote these fitted peak measurements 
Sp 1T . To demonstrate this increased accuracy, in Fig. lAll we 
illustrate the pixellation error exhibited by 2D parabolic fit- 
ting, which we define as £ FIT = S* 1T / S™ VE . We note that 
our e^^pijtei an d e off T P ixci curves in Fig. lAll were obtained an- 
alytically; for brevity we will not reproduce the straightfor- 
ward derivation of Sp IT here. This derivation involves eval- 
uating raw pixel intensities from either spot samples from 
a 2D Gaussian for the on-pixel case, or evaluating equa- 
tion (1 A2f) at different pixel positions for the off-pixel case, 
then performing least squares to solve for an overdetermined 
system of linear equations (6 unknown fit parameters and 9 
constraining pixels). 



Both S„ 



and Sp 



exhibit pixellation error; the latter 



measure of peak SB is more accurate. To limit pixellation er- 
ror to within 1% using S° BS , at least 12 pixels per FWHM 
are required; for Sp 1T , this number falls to around 5. We 
suggest that observers estimate the degree to which their 
peak SB measurements may be in error due to pixellation, 
and incorporate this into their error budgets. In BLOBCAT, 
which catalogues fitted peak SB values (Sp 1T ), this is im- 
plemented using a pixellation error parameter which we de- 
fine as AS 113 ™ = (1 — Eoff-cctrc ) i this parameter is applied in 
equation (130 ^ . We note that inclusion of this parameter will 
tend to (slightly) over-estimate peak SB errors for resolved 
sources; we see this as more appropriate than underestimat- 
ing peak SB errors for point sources because this error is 
unlikely to be relevant for resolved sources (where the inte- 
grated SB represents the flux density; see § 14.11) . 

Finally, we note that integrated SB measurements are 
less affected by pixellation error than peak pixels. This is be- 
cause integrated SB is conserved when summing over multi- 
ple pixels. This conservation is limited only by noise fluctu- 
ations and the ratio between the peak SNR of a source and 
the flood fill cutoff. To illustrate this limitation, consider a 
faint unresolved source situated in a heavily pixellated image 
(i.e. where N a and Ns are small). The profile of this source 
will be poorly mapped by the pixels, rendering BLOBCAT's in- 
tegrated SB measurement (via equation (|16[) ) vulnerable to 
negative bias. However, in general this vulnerability will not 
be an issue because it is the peak SB that is the important 
value for unresolved sources (see § 14. ip . 



APPENDIX B: BLOBCAT INPUTS 

For completeness, a full list of program input arguments to 
BLOBCAT is presented below. Note that not all arguments 
may be required for analysis (see § l2.5H2.6l see also the 
default values provided in the code). For example, if errors 
are not required (or are not suitably defined for a particu- 
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lar observational scenario), the input arguments relating to 
errors below may be ignored. (Conversely, new input argu- 
ments may be easily denned by the user and incorporated 
into BLOBCAT.) 

Argument 1: SB_image . f its 
FITS image of surface brightness in Stokes / intensity (or 
Stokes Q, U, or V intensities under limited conditions) or 
linear polarization (L or L RM ); see 5 I3XT1 

Argument 2: rmsval 
Uniform (spatially-invariant) background rms noise level 
within SB image. This is required if Argument 3 is not pro- 
vided. 

Argument 3: rmsmap 
FITS image of background rms noise; see § 12.1.21 

Argument 4- bwsval 
Uniform (spatially-invariant) level of bandwidth smearing 
present in the SB image. This is required if Argument 5 
is not provided. To ignore bandwidth smearing, this value 
should be set to 1. 

Argument 5: bwsmap 
FITS image of background rms noise; see § 12.1.31 

Arguments 6-8: bmaj , bmin, bpa 
Image resolution (beam) parameters; these are only required 
if image header items are incorrect or incomplete (at present, 
beam parameters are not standard FITS headers). 

Arguments 9-10: dSNR, f SNR 
SNR thresholds for blob detection (Td) and flooding cutoff 
(Tf);see§[U 

Argument 11: pmep 
Maximum estimated peak SB attenuation due to pixellation 
error (see Appendix A) ; defined here as the maximum antic- 
ipated value of (1 — Eofflcentre)' When set to a value greater 
than 0, this parameter will ensure that sources with raw ob- 
served peak SB less than the nominated detection threshold 
(Sp BS < T d ), yet fitted peak SB greater than this threshold 
(Sp 1T ^ Td), will be accepted into the catalogue. If ignored, 
pmep will default to 1, causing BLOBCAT to check all blobs 
with S° BS ^ Tf for catalogue acceptance (though this will 
increase BLOBCAT's run-time, particularly if T d and Tf differ 
greatly in magnitude). 

Arguments 12-13: cpeRA, cpeDec 
Phase calibrator positional error in RA (<T Q , ca i) and Dec 
(cr«5,cai); see §[21j] 

Argument 14-: SEM 
Standard error of the mean of the variation in the phase cor- 
rections resulting from phase self-calibration (<t S em), which 
is used to calculate <Tf ramc ; see § 12.61 

Argument 15: pasbe 
Percentage absolute SB error resulting from calibration 
(AS- ABS ); see §rjU 

Argument 16: pppe 
Percentage peak SB pixellation error (A5* PIX ); see § 12. bl and 
Appendix A. 

Argument 17: cb 
Average clean bias correction (AS 103 0); see § 12.61 

Argument 18: lamfac 
A factor for peak SB bias correction; see § 12.4.11 

Argument 19: visArea 
Option to calculate visibility areas (can increase program 
run-time by more than an order of magnitude) ; see § 12.61 



Arguments 20-22: minpix, maxpix, pixdim 
Minimum and maximum accepted blob sizes in pixels, and 
minimum number of pixels in RA/Dec dimensions for ac- 
cepted blobs (useful for filtering out easily-identified image 
artefacts) . 

Argument 23: edgemin 
Edge buffer in pixels; if flood fill attempts to enter this buffer 
zone, the blob is rejected (and reported to the user). 

Arguments 24-25: write , hfill 
Options to write flooded blobs to an output FITS file and 
to set the blob highlight value; see § 12.71 

Arguments 26-27: kvis , ds9 
Options to write an output kvis or ds9 overlay file; see § 12.71 

Arguments 28-29: plot , plotRng 
Option to produce a diagnostic screen plot displaying 
flooded blobs in the SB image, and additional option to 
specify this plot's dynamic range. 

This paper has been typeset from a TjrjX/ ETffjX file prepared 
by the author. 



